Adaptive learning by extremal dynamics and negative feedback 
We describe a mechanism for biological learning and adap-
tation based on two simple principles: (I) Neuronal activ-
ity propagates only through the network's strongest synap- 
tic connections (extremal dynamics), and (II) The strengths 
of active synapses are reduced if mistakes are made, other- 
wise no changes occur (negative feedback). The balancing of 
those two tendencies typically shapes a synaptic landscape 
with configurations which are barely stable, and therefore 
highly flexible. This allows for swift adaptation to new sit- 
uations. Recollection of past successes is achieved by pun- 
ishing synapses which have once participated in activity as- 
sociated with successful outputs much less than neurons that 
have never been successful. Despite its simplicity, the model 
can readily learn to solve complicated nonlinear tasks, even 
in the presence of noise. In particular, the learning time for 
the benchmark parity problem scales algebraically with the 
problem size N, with an exponent k ~ 1.4. 



I. INTRODUCTION 

In his seminal essay, "The Science of the Artificial" R] 
the economist Herbert Simon suggested that biological 
systems, including those involving humans, are "satisfic- 
ing" rather than optimising. The process of adaptation 
stops as soon as the result is deemed good enough, ir- 
respective of the possibility that a better solution might 
be achieved by further search. In reality, there is no way 
to find global optima in complex environments, so there 
is no alternative to accepting less than perfect solutions 
that happen to be within reach, as Ashby sustained in 
his "Design for a brain" 0]. We shall present results on 
a schematic "brain" model of self-organized learning and 
adaptation that operates using the principle of satisfic- 
ing. The individual parts of the system, called synaptic 
connections, are modified by a negative feedback process 
until the output is deemed satisfactory; then the process 
stops. There is no further reward to the system once an 
adequate result has been achieved: this is learning by a 
stick, not a carrot! The process starts up again as soon as 
the situation is deemed unsatisfactory, which could hap- 
pen, for instance, when the external conditions change. 
The negative signal may represent hunger, physical pain, 
itching, sex-drive, or some other unsatisfied physiological 
demand. 



Formally, our scheme is a reinforcement-learning algo-
rithm (or rather de-inforcement learning since there is no
positive feedback) , Q where the strengths of the elements 
are updated on the basis of the signal from an external
critic, with the added twist that the elements (neuronal 
connections) do not respond to positive signals. 

Superficially, one might think that punishing unsuc- 
cessful neurons is the mirror equivalent to the usual Heb-
bian learning where successful connections are strength-
ened |g]. This is not the case. The Hebbian process, 
like any other positive feedback, continues ad infinitum, 
in the absence of some ad hoc limitation. This will ren- 
der the successful synapse strong, and all other synapses 
relatively weak, whereas the negative feedback process 
employed here stops as soon as the correct response is 
reached. The successful synaptic connections are only 
barely stronger than unsuccessful ones. This makes it 
easy for the system to forget, at least temporarily, its 
response and adjust to a new situation when need be. 

The synaptic landscapes are quite different in the two 
cases [Araujo]. Positive reinforcement leads to a
few strong synapses in a background of weak synapses. 
Negative feedback leads to many connections of similar 
strength, and thus a very volatile, noncommittal struc- 
ture. Any positive feedback will limit the flexibility and 
hence the adaptability of the system. Of course, there 
may be instances where positive reinforcement takes 
place, in situations where hard-wired connections have to
be constructed once and for all, without concern for later 
adaptation to new situations. 

The process is self-organized in the sense that no ex- 
ternal computation is needed. All components in the 
model can be thought of as representing known biological 
processes, where the updating of the states of synapses 
takes place only through local interactions, either with 
other neighboring neurons, or with extracellular signals 
transmitted simultaneously to all neurons. The process 
of suppressing synapses has actually been observed in the 
real brain and is known as long term depression, or LTD, 
but its role for actual brain function has been unclear 
g . We submit that depression of synaptic efficacy is the 
fundamental dynamic mechanism in learning and adap- 
tation, with LTP, the long term potentiation of synapses 
usually associated with Hebbian learning, playing a sec- 
ondary role. 

Although we did have the real brain in mind when 
setting up the model, it is certainly not a realistic rep-
resentation of the overwhelming intricacies of the human
brain. Its sole purpose is to demonstrate a general princi- 
ple that is likely to be at work, and which could perhaps 
lead to the construction of better artificial learning sys- 
tems. The model presented here is a "paper airplane", 
which indeed can fly but is completely inadequate to ex- 
plain the complexity of real airplanes. 

Most neural network modelling so far has been con- 
cerned with the artificial construction of memories, in the 
shape of robust input-output connections. The strengths 
of those connections are usually calculated by the use 
of mathematical algorithms, with no concern for the dy- 
namical biological processes that could possibly lead to 
their formation in a realistic "in vivo" situation. In the 
Hopfield model, memories are represented by energy min- 
ima in a spin-glass like model, where the couplings be- 
tween Ising spins represent synaptic strengths. If a new 
situation arises, the connections have to be recalculated
from scratch. Similarly, the back-propagation algorithm 
underlying most commercial neural networks is a New- 
tonian optimization process that tunes the synaptic con- 
nections to maximize the overlap between the outputs 
produced by the network and the desired outputs, based 
on examples presented to the network. All of this may 
be good enough when dealing with engineering type of 
problems where biological reality is of no concern, but 
we believe that this modelling gives no insight into how 
real brain-like function might come about.

Intelligent brain function requires not only the ability 
to store information, such as correct input-output connec-
tions. It is mandatory for the system to be able to adapt 
to new situations, and yet later to recall past experiences, 
in an ongoing dynamical process. The information stored 
in the brain reflects the entire history that it has expe- 
rienced, and can take advantage of that experience. Our 
model illustrates explicitly how this might take place. 

The extremal dynamics allows one to define an "active" 
level, representing the strength of synapses connecting 
currently firing neurons. The negative response assures 
that synapses that have been associated with good re- 
sponses in the past have strengths that are barely less 
than the active ones, and can readily be activated again 
by suppressing the currently active synapses.

The paper is organized as follows. The next section 
defines the general problem in the context of our ideas. 
The model to be studied can be defined for many dif- 
ferent geometries. In section III we review the layered 
version of the model uM, with a single hidden layer. It 
will be shown how the correct connections between in- 
puts are generated, and how new connections are formed 
when some of the output assignments change. In sec- 
tion IV we introduce selective punishment of neurons, 
such that synapses that have never been associated with 
correct outputs are punished much more severely than 
synapses that have once participated in the generation 
of a good output. It will be demonstrated how this al-
lows for speedy recovery, and hierarchical storage, of old,
good patterns. In multi-layered networks, and in random 
networks, recovery of old patterns takes place in terms of 
self-organized switches that direct the signal to the cor- 
rect output. Also, the robustness of the circuit towards 
noise will be demonstrated. 

Section V shows that the network can easily perform 
more complicated operations, such as the exclusive-OR 
(XOR) process, contrary to recent claims in the litera- 
ture pq |. It can even solve the much more complicated 
parity problem in an efficient way. In the parity prob- 
lem, the system has to decide whether the number of 
binary 1s among N binary inputs is even or odd. In
those problems, the system does not have to adapt to 
new situations, so the success is due to the volatility of 
the active responses, allowing for efficient search of state 
space without locking-in at spurious, incorrect, solutions. 
In the same section we show how the model can readily 
learn multi-step tasks, adapt to new multi-step tasks, 
and store old ones for later use, exactly as for the simple 
single step problems. Finally section VI contains a few 
succinct remarks about the most relevant points of this 
work. The simple programs that we have constructed 
can be down-loaded from our web-sites pq ]. For an in- 
depth discussion of the biological justification, we refer 
the readers to a recent article iflill . 



II. THE PROBLEM 
A. What is it that we wish to model?

Schematically, we envision intelligent brain function as 
follows: 

The brain is essentially a network of neurons connected 
with synapses. Some of these neurons are connected to 
inputs from which they receive information from the out- 
side world ||7|. The input neurons are connected with 
other neurons. If those neurons receive a sufficiently 
strong signal, they fire, thereby affecting more neurons, 
and so on. Eventually, an output signal acting on the out- 
side world is generated. All the neurons that fired in the 
entire process are "tagged" with some chemical for later 
identification ||^ . The action on the outside is deemed ei- 
ther good (satisfactory) or bad (not satisfactory) by the 
organism. If the output signal is satisfactory, no further 
action takes place. 

If, on the other hand, the signal is deemed unsatisfac- 
tory, a global feedback signal - a hormone, for instance 
- is fed to all neurons simultaneously. Although the sig- 
nal is broadcast democratically to all neurons, only the 
synapses that were previously tagged because they con- 
nected firing neurons react to the global signal. They 
will be suppressed, whether or not they were actually re- 
sponsible for the bad result. Later, this may lead to a 
different routing of the signals, so that a different output
signal may be generated. The process is repeated until a
satisfactory outcome is achieved, or, alternatively, until 
the negative feedback mechanism is turned off, i.e., the 
system gives up. In any case, after a while the tagging 
disappears. 

The time-scale for tagging is not related to the time- 
scale of transmitting signals in the brain but must be 
related to a time scale of events in the real outside world, 
such as a realistic time interval between starting to look 
for food (opening the refrigerator) and actually finding 
food and eating it. It is measured in minutes and hours 
rather than in milliseconds. 

All of this allows the brain to discover useful responses 
to inputs, and to modify swiftly the synaptic connections when
the external situation changes, since the active synapses 
are usually only barely stronger than some of the inac- 
tive ones. It is important to invoke a mechanism for low 
activity in order to selectively punish the synapses that 
are responsible for bad results. 

However, in order for the system to be able to recall 
past successes, which may become relevant again at a 
later point, it is important to store some memory in the 
neurons. In accordance with our general philosophy, we 
do not envision any strengthening of successful synapses. 
In order to achieve this, we invoke the principle of selec- 
tive punishment: neurons which have once been associ- 
ated with successful outputs are punished much less than 
neurons that have never been involved in good decisions. 
This yields some robustness for successful patterns with 
respect to noise, and also helps construct a tool-box
of neuronal patterns stored immediately below the active 
level, i. e. their inputs are slightly insufficient to cause
firing. This "forgiveness" also makes the system stable 
with respect to random noise - a good synapse that fires 
inadvertently because of occasional noise is not severely 
punished. Also, the extra feature of forgiveness allows 
for simple and efficient learning of sequential patterns, i. 
e. patterns where several specific consecutive steps have 
to be taken in order for the output to become success- 
ful, and thus avoid punishment. The correct last steps
will not be forgotten when the system is in the process
of learning early steps. 

In the beginning of the life of the brain, all search must 
necessarily be arbitrary, and the selective, Darwinian,
non-instructional nature of the process is evident. Later, 
however, a tool-box of useful connections has been built
up, and most of the activity is associated with previously 
successful structures - the process appears to be more 
and more directional, since fewer and fewer mistakes are 
committed. 

Roughly speaking, the sole function of the brain is to 
get rid of irritating negative feedback signals by suppress- 
ing firing neurons, in the hope that better results may be 
achieved that way. A state of inactivity, or Nirvana, is 
the goal! A gloomy view of Life, indeed! The process 
is Darwinian, in the sense that unsuitable synapses are
killed, or at least temporarily suppressed, until perhaps
in a different situation they may play a more useful role. There
is no direct "Lamarckian" learning by instruction, but 
only learning by negative selection. 

It is important to distinguish sharply between features 
that must be hardwired, i. e. genetically generated by 
the Darwinian evolutionary process, and features that 
have to be self-organized, i. e., generated by the intrin- 
sic dynamics of the model when subjected to external 
signals. Biology has to provide a set of more or less ran- 
domly connected neurons, and a mechanism by which an 
output is deemed unsatisfactory, a "Darwinian good se- 
lector" , transmitting a signal to all neurons (or at least 
to all neurons in a sector of the brain). It is absurd to 
speak of meaningful brain processes if the purpose is not 
defined in advance. The brain cannot learn to define 
what is good and what is bad. In our model this is given 
at the outset. Biology also must provide the chemical or 
molecular mechanisms by which the individual neurons 
react to this signal. From there on, the brain is on its 
own! There is no room for further ad hoc tinkering by 
"model builders" . We are allowed to play God, not Man! 

Of course, this is not necessarily a correct, and cer- 
tainly not a complete, description of the process of self- 
organized intelligent behaviour in the brain. However, we 
are able to construct a specific model that works exactly 
as described above, so the scenario is at least feasible. 



B. So how do we actually model all of this? 

Superficially, one would expect that the severe limi-
tations imposed by the requirements of self-organization
will put us in a straight-jacket and make the perfor- 
mance poor. Surprisingly, it turns out that the resulting 
process is actually very efficient compared with non-self- 
organized processes such as back-propagation - in addi- 
tion to the fact that it executes a dynamical adaptation 
and memory process not performed by those networks at 
all. 

The amount of activity has to be sparse in order to 
solve the "credit (or rather blame) assignment" problem 
of identifying the neurons that were responsible for the 
poor result. If the activity is high, say 50% of all neu- 
rons are firing, then a significant fraction of synapses are 
punished at each time step, precluding any meaningful 
amount of organization and memory. One could accom- 
plish this by having a variable threshold, as in the work 
by Alstrom and Stassinopoulos |0] , and by Stassinopou- 
los and Bak |I2]. Here, we use instead "extremal dynam- 
ics" , as was introduced by Bak and Sneppen (BS) |jl3] in 
a simple model of evolution, where it resulted in a highly 
adaptive self-organized critical state. At each point in 
time, only a single neuron, namely the neuron with the 
largest input, fires. 



The basic idea is that at a critical state the susceptibil- 
ity is maximized, which translates into high adaptability. 
In our model, the specific state of the brain depends on 
the task to be learned, so perhaps it does not generally 
evolve to a strict critical state with power law avalanches 
etc. as in the BS model. Nevertheless, it always operates
at a very sensitive state which adapts rapidly to changes 
in the demands imposed by the environment. 

This "winner take all" dynamics has support in well 
documented facts in neurophysiology. The mechanism 
of lateral inhibition could be the biological mechanism 
implementing extremal dynamics. The neuron with the 
highest input firing rate will first reach its threshold firing 
potential sending an inhibitory signal to the surrounding, 
competing neurons, for instance in the same layer, pre- 
venting them from firing. At the same time it sends an 
excitatory signal to other neurons downstream. In any 
case, there is no need to invoke a global search procedure, 
not allowed by the ground rules of self-organization, in or- 
der to implement the extremal dynamics. The extremal 
dynamics, in conjunction with the negative feedback, al- 
lows for efficient credit assignment. 

One way of visualizing the process is as follows. Imag- 
ine a pile of sand (or a river network, if you wish) . Sand 
is added at designated input sites, for instance at the top 
row. Tilt the pile until one grain of sand (extremal dy- 
namics) is toppling, thereby affecting one or more neigh- 
bors. We then tilt the pile again until another site top- 
ples, and so on. Eventually, a grain is falling off the 
bottom row. If this is the site that was deemed the cor- 
rect site for the given input, there are no modifications 
to the pile. However, if the output is incorrect, then a lot 
of sand is added along the path of falling grains, thereby 
tending to prevent repeat of the disastrous result. Even- 
tually the correct output might be reached. If the exter- 
nal conditions change, so that another output is correct, 
the sand, of course, will trickle down as before, but the 
old output is now deemed inappropriate. Since the path 
had just been successful, only a tiny amount of sand is 
added along the trail, preserving the path for possible 
later use. As the process continues, a complex landscape 
representing the past experiences, and thus representing 
the memory of the system, will be carved out. 



III. THE MODEL 

A. The simplest layered model 

In the simplest layered version, treated in detail in
Ref. uM, the setup is as follows (Fig. 1). There is a
number of input cells, an intermediate layer of "hidden" 
neurons, and a layer of output neurons. Each of the input 
neurons, i, is connected with each neuron in the middle
layer, j, with synaptic strength w(ji). Each hidden neu-
ron, in turn, is connected with each output neuron, k,
with synaptic strength w(kj). Initially, all the connec-
tion strengths are chosen to be random, say with uniform
distribution between 0 and 1. Each input signal consists
(for the time being) of a single input neuron firing. For 
each input signal, a single output neuron represents the 
pre-assigned correct output signal, representing the state 
of the external world. The network must learn to connect 
each input with the proper output for any arbitrary set 
of assignments, called a map. The map could for instance 
assign each input neuron i to the output neuron with the 
same label. (In a realistic situation, the brain could re- 
ceive a signal that there is some itching at some part of 
the body, and an output causing the fingers to scratch 
at the proper place must be generated for the signal to 
stop). At each time step, we invoke "extremal dynam- 
ics" equivalent with a "winner take all" strategy: only 
the neuron connected with the largest synaptic strength 
to the currently firing neuron i will fire at the next time 
step. 

The entire dynamical process goes as follows: 

i) An input neuron i is chosen to be active.

ii) The neuron j_m in the middle layer which is con-
nected with the input neuron with the largest w(ji) is
firing.

iii) Next, the output neuron k_m with the maximum
w(k j_m) is firing.

iv) If the output k happens to be the desired one,
nothing is done,

v) otherwise, that is if the output is not correct,
w(k_m j_m) and w(j_m i) are both depressed by an amount
δ, which could either be a fixed amount, say 1, or a ran-
dom number between 0 and 1.

vi) Go to i). Another random input neuron is chosen
and the process is repeated.

That is all! The constant δ is the only parameter of the
model, but since only relative values of synaptic strengths 
are important, it plays no role. If one finds it un-aesthetic 
that the values of the connections are always decreas- 
ing and never increasing, one could raise the values of 
all connections such that the value of the largest output 
synaptic strength for each neuron is 1. This has no effect 
on the dynamics. 
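
The six steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the program used for the results reported here: the function name learn_layered_map, the seed, the step cap and the choice of a fixed δ = 1 are made up for the example, and the test that stops the loop (inspecting all inputs after every step) is an observer's convenience rather than part of the model.

```python
import random


def learn_layered_map(n_in, n_hidden, n_out, target, delta=1.0,
                      max_steps=100_000, seed=0):
    """Layered model with extremal dynamics and negative feedback only.

    target[i] is the desired output neuron for input neuron i.
    Returns (steps, respond); steps is None if the step cap was reached.
    """
    rng = random.Random(seed)
    # w_hid[j][i] plays the role of w(ji); w_out[k][j] plays the role of w(kj).
    w_hid = [[rng.random() for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [[rng.random() for _ in range(n_hidden)] for _ in range(n_out)]

    def respond(i):
        # Extremal dynamics: only the most strongly connected neuron fires.
        j_m = max(range(n_hidden), key=lambda j: w_hid[j][i])
        k_m = max(range(n_out), key=lambda k: w_out[k][j_m])
        return j_m, k_m

    def all_correct():
        return all(respond(i)[1] == target[i] for i in range(n_in))

    for step in range(1, max_steps + 1):
        i = rng.randrange(n_in)        # i) choose a random input neuron
        j_m, k_m = respond(i)          # ii), iii) propagate the signal
        if k_m != target[i]:           # v) wrong output: depress the active path
            w_hid[j_m][i] -= delta
            w_out[k_m][j_m] -= delta
        if all_correct():              # observer's check (assumption, see text)
            return step, respond
    return None, respond


if __name__ == "__main__":
    steps, respond = learn_layered_map(7, 50, 7, target=list(range(7)))
    print("steps needed:", steps)
    print("learned outputs:", [respond(i)[1] for i in range(7)])
```

Nothing is ever strengthened in this sketch: a successful path is simply left alone, so it ends up only slightly stronger than its competitors.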




FIG. 1. Topology and notation for the three geometries of the
model. A) The simplest layered model with input layer i, connected
via synapses w(j i) to all nodes j in the middle layer, which, in turn,
are connected to all output neurons k by synapses w(k j). B) The
lattice version is similar to the layered case except that each node
connects forward only with a few (three in this case) of the neurons
in the adjacent layer. C) The random network has N neurons, i,
each connected with n_c other neurons j, with synaptic strengths
w(j i) (only a couple are shown). Some of them (n_i) are preselected
as input and some (n_o) as output neurons. A maximum number
(t_f) of firings is allowed in order to reach the output.

We imagine that the synapses w(k_m j_m) and w(j_m i)
connecting all firing neurons are "tagged" by the activity,
identifying them for possible subsequent punishment. In
real life, the tagging must last long enough to ensure that
the result of the process is available - the time-scale must
match typical processes of the environment rather than
firing rates in the brain. If a negative feedback is received,
all the synapses which were involved and therefore tagged
are punished, whether or not they were responsible for
the bad result. This is democratic but, of course, not
fair. We cannot imagine a biologically reasonable mech-
anism that permits identification of synapses for selective
punishment (which could of course be more efficient) as
is assumed in most neural network models. The use of
extremal dynamics solves the crucial credit assignment
problem, which has been a stumbling block in previous
attempts to model self-organized learning, in a simple
and elegant way.

Eventually, the system learns to wire each input to its
correct output counterpart. The time that it takes is
roughly equal to the time that a random search for each
input would take. Of course, no general search process
could in principle be faster [pi| in the absence of any
pre-knowledge of the assignment of output neurons. It
is important to have a large number of neurons in the
middle layer in order to prevent the different paths from
interfering, and thus destroying connections that have
already been correctly learned.



FIG. 2. The time to learn a given task decreases when the num- 
ber of neurons in the middle layer is increased. Data points are 
averages from 1024 realizations. 

Figure 2 shows the results from a simulated layered
system with 7 input and 7 output nodes, and a variable 
number of intermediate nodes. The task was simply to 
connect each input with one output node (it does not 
matter which one). In each step we check if the seven 
pre-established input-output pairs were learnt and com- 
pute over many realizations the average time to learn 
all input-output connections. The figure shows how the 
average learning time decreases with the number of hid- 
den neurons. More is better! Biologically, creating a 
large number of more or less identical neurons does not 
require more genetic information than creating a few, 
so it is cheap. On the other hand, the set-up will def-
initely lose in a storage-capacity beauty contest with
orthodox neural networks - that is the price to pay for 
self-organization! We are not allowed to engineer non- 
interfering paths - the system has to find them by itself. 

At this point all that we have created is a biologically 
motivated robot that can perform a random search pro- 
cedure that stops as soon as the correct result has been 
found. While this may not sound like much, we believe 
that it is a solid starting point for more advanced mod- 
elling. 

We now subject the model to a new input-output map. 
This reflects that the external realities of the organism 
have changed, so that what was good yesterday is not 
good any more. Food is to be found elsewhere today, and 
the system has to adapt. Some input-output connections 
may still be good, and the synapses connecting them are 
basically left alone. However, some outputs which were 
deemed correct yesterday are deemed wrong today, so 
the synapses that connected those will immediately be
punished. A search process takes place as before in order
to find the new correct connections.

FIG. 3. Adaptation times for a sequence of 700 input-output
maps. The number of unsuccessful attempts to generate the cor-
rect input-output connections is shown. (A random network geom-
etry was chosen, but the result is similar for the other geometries
considered.)

Figure 3 shows the time sequence of the number of
"wrong" input-output connections, which is a mea-
sure of the re-learning time, when the system is subjected
to a sequence of different input-output assignments. For
each re-mapping, each input neuron has a new random
output neuron assigned to it. In general, the re-learning
time is roughly proportional to the number of input-
output assignments that have changed, in the limit of a
very large number of intermediate neurons. If the num-
ber of intermediate neurons is small, the re-learning time
will be longer because of "path interference" between the
connections. In the real world, one could imagine that the
relative amount of change that occurs from day to
day is small and decreasing, so that the re-learning time
becomes progressively lower.

Suppose now that after a few new maps, we return to
the original input-output assignment. Since the original
successful synapses have been weakened, a new pathway
has to be found from scratch. There is no memory of
connections that were good in the past. The network
can learn and adapt, but it cannot remember responses
that were good in the past. In sections 4 and 6 we shall
introduce a simple remedy for that fundamental problem,
which does not violate our basic philosophy of having no
positive feedback.



B. Lattice geometry

The set-up discussed above can trivially be general-
ized to include more intermediate layers. The case of
multi-layers of neurons that are not fully connected with
the neurons in the next layer is depicted in Fig. 1B.
Each neuron in the layer connects forward to three oth-
ers in the next layer. The network operates in a very
similar way: a firing neuron in one layer causes firing of
the neuron with the largest connection to that neuron in
the subsequent layer and so on, starting with the input
neuron at the bottom. Only when the signal reaches the
top output layer will all synapses in the firing chain be
punished, by decreasing their strength by an amount δ as
before, if need be. Interestingly, the learning time does
not increase as the number of layers increases. This is
due to the "extremal dynamics" causing the speedy for-
mation of robust "wires". In contrast, the learning time
for back-propagation networks grows exponentially with
the number of layers - this is one reason that one rarely
sees backprop networks with more than one hidden layer.



C. Random geometry 

In addition to layered networks, one can study the pro- 
cess in a random network, which may represent an actual 
biological system better. Consider an architecture where 
each of n neurons is arbitrarily connected to a number
n_c of other neurons with synaptic strengths w(j i). A
number of neurons (n_i and n_o) are arbitrarily selected as
input and output neurons, respectively. Again, output 
neurons are arbitrarily assigned to each input neuron. 
Initially, a single input neuron is firing. Using extremal 
dynamics, the neuron that is connected with the input 
neuron with the largest strength is then firing, and so on. 
If after a maximum number of firing events t_f the correct
output neuron has not been reached, all the synapses in 
the string of firing neurons are punished as before. Sum- 
marizing, the entire dynamical process is as follows: 

i) A single input neuron is chosen. 

ii) This neuron is connected randomly with several 
others, and the one which is connected with the largest 
synaptic strength fires. The procedure is repeated a pre-
scribed maximum number of times t_f, thereby creating
and labelling a chain of firing neurons. 

iii) If, during that process, the correct output has not 
been reached, each synapse in the entire chain of firings 
is depressed by an amount δ.

iv) If the correct output is achieved, there is no plastic 
modification of the neurons that fired. Go to i) 
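
The random-geometry procedure can be sketched in the same spirit. Again this is a hedged illustration rather than the program behind the figures: the helper names (build_random_net, fire_chain, learn_random_net), the default values of t_f and δ, and the step cap are assumptions. Two further choices are not fixed by the rules above and are assumed here: the chain stops as soon as the correct output neuron fires, and a synapse that appears several times in one chain is punished only once.

```python
import random


def build_random_net(n, n_c, rng):
    """net[i] maps each of the n_c neurons j reached from i to the strength w(j i)."""
    net = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        net.append({j: rng.random() for j in rng.sample(others, n_c)})
    return net


def fire_chain(w, start, goal, t_f):
    """Follow the strongest synapse at most t_f times; stop early at the goal."""
    chain, current = [], start
    for _ in range(t_f):
        nxt = max(w[current], key=w[current].get)   # extremal dynamics
        chain.append((current, nxt))                # tag the active synapse
        current = nxt
        if current == goal:
            return chain, True
    return chain, False


def learn_random_net(w, target, t_f=10, delta=1.0, max_steps=200_000, seed=0):
    """target maps input neuron -> desired output neuron. Returns steps or None."""
    rng = random.Random(seed)
    inputs = list(target)
    for step in range(1, max_steps + 1):
        i = rng.choice(inputs)                      # i) choose an input neuron
        chain, reached = fire_chain(w, i, target[i], t_f)
        if not reached:                             # iii) punish the tagged chain
            for a, b in set(chain):                 # once per synapse (assumption)
                w[a][b] -= delta
        # Observer's stopping test, not part of the model itself.
        if all(fire_chain(w, s, target[s], t_f)[1] for s in inputs):
            return step
    return None
```

A call such as learn_random_net(build_random_net(200, 10, random.Random(0)), {0: 195, 1: 196}) would mimic the parameter regime quoted in the text with two input-output pairs; the neuron labels in that call are arbitrary.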

A system with n = 200, n_i = n_o = 5, n_c = 10 be-
haves like the layered structure presented above (and is
actually the one shown in the figure). This illustrates the
striking development of an organized network structure
even in the case where all initial connections are abso-
lutely uncorrelated. The model creates wires connecting 
the correct outputs with the inputs, using the intermedi- 
ate neurons as stepping stones. 



IV. SELECTIVE PUNISHMENT AND 
REMEMBERING OLD SUCCESSES 

We observed that there was not much memory left the 
second time around, when an old assignment map was 
re-employed - the task had to be re-learned from scratch. 
This turns out to be much more than a nuisance, in par- 
ticular when the task was complicated, like in the case 
of a random network with many intermediate neurons, 
where the search became slow. 

We would like there to be some memory left from pre- 
vious successful experiences, so that the earlier efforts 
would not be completely wasted. 

There is an analogous situation in the immune sys- 
tem, where the lymphocytes can recognize an invader 
faster the second time around. The location and acti- 
vation of memory in biological systems is an important, 
but largely unresolved problem. Speaking about the immune
system, it has in fact been suggested in a series of
remarkable papers by Polly Matzinger that the immune
system is only activated in the presence of "danger" [19].
This is the equivalent of our learning by mistakes. In
fact, Matzinger realizes that the identification of danger
has to be pre-programmed in the innate immune system,
and must have evolved on a biological time scale; this is
the equivalent of our "Darwinian good" (or rather "bad",
or "danger") selector or indicator that decides if the
organism is in a satisfactory state.

It turns out [14] that one single modification to the
rules described above allows for some fundamental
improvements of the system's ability to recognize old
patterns:

iii a) When the output is wrong, a firing synapse that 
has at least once been successful is punished much less 
than a synapse that has never been successful. 

For instance, the punishment of the "good" synapse
could be of the order of $10^{-2}$, compared with a
depression of order unity for a "bad" synapse. The neuron
has earned some forgiveness due to its past good perfor- 
mance. Biologically, we envision that a neuron that does 
not receive a global feedback signal after firing, relaxes its 
susceptibility to a subsequent negative feedback signal by 
some chemical mechanism. It is important to realize that 
the synapse "knows" that it has been successful by the 
fact that it was not punished, so no non-local information 
is invoked. Note that we have not, and will not, include 
any positive Hebbian enforcement in order to implement 
memory in the system - only reduced punishment. 
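
A minimal sketch of rule iii a) is given below, under stated assumptions. The names `punish_chain` and `reward_chain`, the set `successful` used as the per-synapse flag, and the specific amounts (0.01 for previously successful synapses, a uniform random number in [0, 1) otherwise) are illustrative choices consistent with the orders of magnitude quoted above, not a definitive implementation.

```python
import random


def punish_chain(weights, chain, successful, rng, small=0.01):
    """Rule iii a): depress every synapse that fired in an unsuccessful chain.

    weights[(pre, post)] is a synaptic strength; `successful` holds the
    synapses that have fired at least once without being punished.
    """
    for synapse in set(chain):
        if synapse in successful:
            weights[synapse] -= small            # earned forgiveness: mild punishment
        else:
            weights[synapse] -= rng.random()     # never successful: order-unity punishment


def reward_chain(chain, successful):
    """No weight is increased on success; the synapse only records, locally,
    that it fired and was not punished."""
    successful.update(chain)


# Hypothetical usage with three synapses of equal strength.
weights = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
successful = set()
reward_chain([(0, 1)], successful)                       # (0, 1) was once successful
punish_chain(weights, [(0, 1), (1, 2)], successful, random.Random(0))
```

Note that `reward_chain` changes no strength at all, in keeping with the statement that memory is implemented by reduced punishment only.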



We have applied this update scheme to both the lay- 
ered and the random version of the model. For the ran- 
dom model, we choose 200 intermediate neurons, plus 5 
designated input neurons and 5 output neurons. Each 
neuron was connected randomly with 10 other neurons. 
First, we arbitrarily assigned a correct output to each 
input, and ran the algorithm above, until the map had 
been learned. After unsuccessful firings, punishment was
applied: an amount of 0.001 to previously successful
neurons, and a random number between 0 and 1 for those
that had never been successful. Then we arbitrarily
changed one input-output assignment, and repeated the 
learning scheme. A new random reassignment of a single 
input-output pair was introduced, and so on. 
FIG. 4. Learning time for 700 adaptations for the random net-
work with reduced punishment for successful synapses. Both plots
show the same data, but in the inset the scale is magnified to better
illustrate the fast re-learning. The network has 5 inputs, 5 out-
puts, and 200 intermediate neurons, each connected with 10 other
neurons.

In the beginning, the learning time is large, corre- 
sponding roughly to the time for a random search for 
each connection. New connections have to be discovered 
at each input-output assignment. However, after several 
switches, the time for adaptation becomes much shorter, 
of the order of a few time steps. Figure 4 shows the
time for adaptation for hundreds of consecutive input- 
output reassignments. The process becomes extremely 
fast compared with the initial learning time. Typically, 
the learning time is only 0-10 steps, compared with hun- 
dreds or thousands of steps in the initial learning phase. 
This is because any "new" input-output assignment is 
not really new, but has been imposed on the system be- 
fore. The entire process, in one particular run with 1000 
adaptations, involved a total of only 38 neurons out of 200 
intermediate neurons to create all possible input-output 
connections, and thus all possible maps. 

In order to understand this, it is useful to introduce the 
concept of the "active level" , which is simply the strength 



of the strongest synaptic output connection from the
neuron which has just been selected by the extremal
dynamics. For simplicity, and without changing the firing
pattern whatsoever, we can normalize this strength to
unity. The strengths of the other output synapses are 
thus below the active level. Whenever a previously suc- 
cessful input-output connection is deemed unsuccessful, 
the synapses are punished slightly, according to rule iii 
a), only until the point where a single synapse in the fir- 
ing chain is suppressed slightly below the active level de- 
fined by the extremal dynamics, thus barely breaking the 
input-output connection. Thus, connections that have 
been good in the past are located very close to the active 
level, and can readily be brought back to life again, by 
suppression of firing neurons at the active level if need 
be. 
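
The notion of the active level can be made concrete with a short sketch. It assumes, since punishments are subtractive, that the normalization is an additive shift of all outgoing strengths of a neuron; the function name `normalize_to_active_level` and the numbers are hypothetical.

```python
def normalize_to_active_level(out_weights):
    """Shift all outgoing strengths of one neuron so that the strongest equals 1.

    A common shift changes neither the winner nor the gaps between synapses,
    so the firing pattern is unaffected.
    """
    top = max(out_weights.values())
    return {post: w - top + 1.0 for post, w in out_weights.items()}


# Hypothetical neuron with three outgoing synapses: one active, one barely
# suppressed (a memory of a past success), and one strongly suppressed.
raw = {1: 0.37, 2: 0.36, 3: -0.50}
levels = normalize_to_active_level(raw)
winner_before = max(raw, key=raw.get)
winner_after = max(levels, key=levels.get)
```

In this picture, a synapse sitting just below the active level, such as the one to neuron 2 here, returns to the active level after a single mild punishment of the current winner.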
FIG. 5. Strengths of the synapses for small system with random
connections, with 3 inputs, 3 outputs, and 20 intermediate neurons,
each connected with 5 neurons. There are seven active synapses
with strength 1, and several synapses with strengths just below the
active level. Those synapses represent memories of past successes
(such as the broken lines in Fig. 6).

Figure 5 shows the synaptic landscape after several
re-learning events for a small system with 3 inputs, 3
outputs, and 20 neurons, each connected with 5 other
neurons. The arrow indicates a synapse at the active
level, i.e., a synapse that would lead to firing if its
input neuron were firing. Altogether, there are 7 active
synapses for that particular simulation, representing the 
correct learning of the current map. Note that there are 
many synaptic strengths just below the active level. The 
memory of past successes is located in those synapses! 

The single synapse that broke the input-output
connection serves as a self-organized switch, redirecting the
firing pattern from one neuron chain to another, and 
consequently from one output to another. The adapta- 
tion process takes place by employing these self-organized 
switches, playing the roles of "hub neurons" , assuring 
that the correct output is reached. 





FIG. 6. a) Part of the network connecting a single output with 
the 5 possible inputs. The full line represents the active correct con- 
nection, and the broken lines represent synapses connecting with 
the other inputs. The strengths of those synapses are barely
below the active level. b) Network connecting a single input with all
possible outputs. The synapses marked with * act like switches, 
connecting the input with the correct output. 

Thus, when an input-output connection turns unsuc- 
cessful, all the neurons in the firing chain are suppressed 
slightly, and it is likely that an old successful connection
re-appears at the active level. Perhaps that connection
is also unsuccessful, and will be suppressed, and another 
previously successful connection may appear at the active 
level. The system sifts through old successful connections 
in order to look for new ones. 

Every now and then, there is some path interference, 
and re-learning takes longer, as indicated by the rare
glitches of long adaptation times in Figure 4. Also, now
and then previously unused synapses interfere, since the 
strength of the successful synapses slowly becomes lower 
and lower. Thus even when successful patterns between 
all possible input-output pairs have been established, the 
process of adaptation now and then changes the paths of 
the connections. 

Perhaps this mimics the process of thinking: 

"Thinking" is the process, where unsuccessful neuronal 
patterns are suppressed by some "hormonal" feedback 
mechanism, allowing old successful patterns to emerge. 
The brain sifts through old solutions until, perhaps, a 
good pattern emerges, and the process stops. If no suc- 
cessful pattern emerges, the brain panics: it has to search 
more or less randomly in order to establish a new, correct 
input-output connection. 

The input patterns do not change during the thinking 
process: one can think with closed eyes. 

Figure 6a shows the entire part of the network which is
involved with a single input neuron, allowing it to connect 
with all possible outputs. The full line indicates synapses 
at the active level, connecting the input with the correct 
output. The broken lines indicate synapses that connect 
the input with other outputs. They are located just be- 
low the active level. The neurons marked by an asterisk 
are switches, and are responsible for redirecting the flow. 

Similarly, Fig. 6b shows all the synapses connecting a



single output with all possible inputs. The neurons with 
the asterisks are "hub neurons" , directing several inputs 
to a common output. Once such a neuron is firing, the
output is recognized, correctly or incorrectly. A total of 
only 5 intermediate neurons are involved in connecting 
the output with all possible inputs. 

Note that short-term and long-term memories are not 
located at, or ever relocated to, different locations. They 
are represented by synapses that are more or less sup- 
pressed relative to the currently active level selected by 
the process of extremal dynamics, and can be reactivated 
through self-organized switches as described above. 

The system exhibits aging at a large time scale: even- 
tually all or most of the neurons will have been success- 
ful at one point or another, and the ability to selectively 
memorize good patterns disappears. The process is not
stationary. If one does not like that, one can let
neurons die and be replaced by fresh neurons with random
connections at a small rate. The death of neurons causes
longer adaptation times now and then since new synaptic 
connections have to be located. 
FIG. 7. Learning with noise. The diagram shows all the synap-
tic connections allowing a single input neuron to connect with all
possible output neurons. The full lines show the currently active
path, and the numbers are the synaptic strengths as explained in
the text.



A. Perfect learning with noise.

It is also interesting to consider the effect of noise.
Suppose that a small noise $\eta$, randomly distributed in
the interval $0 < \eta < \epsilon$, is added to the signal sent to
the neurons. This may cause an occasional wrong output
signal, triggered by synapses with strengths $w(k j_m)$
that were close to that of the correct one, i.e., the one
that would be at the active level in the absence of noise.
However, those synapses will now be suppressed, since
they lead to an incorrect result. After a while, there will
be no incorrect synapses $w(k j_m)$ left for which the
addition of the noise can cause them to exceed the strength of
the correct synapse $w(k_m j_m)$, so no further modifications
will take place, and the input-output connections will be
perfect from then on. Thus, the system deals automatically
with noise! Figure 7 shows all the input-output
connections for one input neuron in a simulation with
three input neurons, three output neurons, and a total
of 50 neurons each connected with 5 neurons. The noise
level is 0.02, and the punishment of previously successful
neurons is 0.002. The numbers are the strengths of
the synapses. Note that the incorrect synapses connected
with the switches are suppressed by a gap of at least 0.02
(the level of the noise) below the correct ones. Note also
that some of the incorrect synapses not connected with
switches are much less suppressed. They are cut off by
switches elsewhere and need not be suppressed in order
to have the signal directed to the correct output.

FIG. 8. Learning times. As Fig. 4, but with uniform random
noise of amplitude 0.02 added to the synaptic strengths. Note the
increase in the adaptation times.

The price to be paid in order to have perfect learning
with noise is that adaptation to new patterns takes
longer, because the active synapses have to be suppressed
further to give way for new synapses. Figure 8 shows the
learning time for 700 successive re-mappings, as in Fig.
4, but with noise added. Note that indeed adaptation is
much slower.


V. BEYOND SIMPLE WIRING: XOR AND 
SEQUENCES 

So far we have considered only simple input-output 
mappings where only a single input neuron was acti- 
vated. However, it is quite straightforward to consider 
more complicated patterns where several input neurons 
are firing at the same time. In the case of the layered 
network, we simply modify the rule ii) above for the se- 
lection of the firing neuron in the second layer as follows: 

ii b) The neuron $j_m$ in the middle layer for which the
sum of the synaptic connections $w(ji)$ with the active
input neurons $i$ is maximum is firing.

For the random network one would modify the rule for 
the firing of the first intermediate neuron similarly. 
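
Rule ii b) amounts to a winner-take-all selection over summed inputs, which can be sketched as follows. The nested-list layout `w[j][i]` (strength from input i to middle neuron j) and the function name `select_middle_neuron` are assumptions made for illustration.

```python
def select_middle_neuron(w, active_inputs):
    """Return the index j_m of the middle-layer neuron whose summed synaptic
    strength from the currently active input neurons is largest."""
    sums = [sum(row[i] for i in active_inputs) for row in w]
    return max(range(len(w)), key=sums.__getitem__)


# Hypothetical strengths for 3 middle neurons (rows) and 3 inputs (columns).
w = [
    [0.9, 0.1, 0.1],
    [0.5, 0.6, 0.0],
    [0.2, 0.2, 0.9],
]
single = select_middle_neuron(w, [0])        # only input 0 is firing
pair = select_middle_neuron(w, [0, 1])       # inputs 0 and 1 fire together
```

With a single active input the rule reduces to the original rule ii); with several active inputs a different middle neuron may win, which is what allows distinct input combinations to be routed to distinct outputs.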



A. XOR operation 

Since the heyday of Minsky and Papert [20], who
demonstrated that only linearly separable functions can
be represented by simple (one-layer) perceptrons, the
ability to perform the exclusive-or (XOR) operation has
been considered a litmus test for the performance of any
neural network. How does our network measure up to the
test? Following Klemm et al. [15], we choose to include
three input neurons, two of them representing the input
bits for which we wish to perform the XOR operation,
and a third input neuron which is always active. This
bias assures that there is a non-zero input even when the
response to the 00 bits is considered. The two possible
outputs of the XOR operation require that the network
have two output neurons.

The inputs are represented by a string of $l$ binary units
$x_1, \ldots, x_l$, $x_i \in \{0,1\}$. As explained in Section III,
neurons are connected by weights $w$ from each input ($i$) to
each hidden ($j$) unit and from each hidden unit to each
output ($k$) unit.

The dynamics of the network is defined by the following
steps. One stimulus is randomly selected out of the four
possible (i.e., 001, 101, 011, 111) and applied to $x_1, x_2, x_3$.
Each hidden node $j$ then receives a weighted input
$h_j = \sum_{i=1}^{l} w_{ji} x_i$. The state is chosen according to the
winner-take-all rule, i.e., the neuron $j_m$ with the largest $h_j$ fires
(i.e., $x_{j_m} = 1$). Since there is only one active intermediate
neuron, the output neuron is chosen as before to be the
one connected with that neuron by the largest strength
$w_{kj}$.

Adaptation to changing tasks is not of interest here, so
we choose to simulate the simplest algorithm in Section
III (i.e., without any selective punishment). As shown in Fig.
9, networks with the minimum number of intermediate
neurons (three for this problem) are able to solve the task 
in as few as tens of presentations. Of course, networks 
with larger middle layers learn significantly faster, up to 
an asymptotic limit which for this problem is reached for 
about 20 nodes. 
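
The XOR set-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the initial weights are uniform in [0, 1), each active synapse of a wrong response is depressed by an independent uniform random amount, and the names `respond`, `solved`, and `train` are hypothetical. No claim is made that it reproduces the learning times reported in the text.

```python
import random

# The four stimuli (bit 1, bit 2, bias); the third unit is always active.
STIMULI = [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1)]


def respond(w_in, w_out, x):
    """Winner-take-all in the middle layer, then strongest synapse to an output."""
    h = [sum(w * xi for w, xi in zip(row, x)) for row in w_in]
    j_m = max(range(len(h)), key=h.__getitem__)
    k_m = max(range(len(w_out)), key=lambda k: w_out[k][j_m])
    return j_m, k_m


def solved(w_in, w_out):
    """True if every stimulus is mapped to the XOR of its two input bits."""
    return all(respond(w_in, w_out, x)[1] == (x[0] ^ x[1]) for x in STIMULI)


def train(n_hidden, rng, max_presentations=20_000):
    """Return (presentations needed or None, w_in, w_out)."""
    w_in = [[rng.random() for _ in range(3)] for _ in range(n_hidden)]
    w_out = [[rng.random() for _ in range(n_hidden)] for _ in range(2)]
    for t in range(max_presentations):
        if solved(w_in, w_out):
            return t, w_in, w_out
        x = rng.choice(STIMULI)
        j_m, k_m = respond(w_in, w_out, x)
        if k_m != (x[0] ^ x[1]):                 # mistake: depress active synapses only
            for i, xi in enumerate(x):
                if xi:
                    w_in[j_m][i] -= rng.random()
            w_out[k_m][j_m] -= rng.random()
    done = max_presentations if solved(w_in, w_out) else None
    return done, w_in, w_out


steps, w_in, w_out = train(n_hidden=20, rng=random.Random(1))
```

A hand-built solution with three middle neurons, the minimum quoted above, is `w_in = [[-1, -1, 1.0], [0.5, 0.5, 0.2], [1, 1, -0.5]]` with `w_out = [[1, 0, 1], [0, 1, 0]]`: the first neuron wins for 00, the second for 01 and 10, and the third for 11.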
FIG. 9. Learning the XOR problem. The top panel shows the
distribution of learning times for a middle layer with 20 nodes. The 
bottom panel shows the average (circles) and the mode (crosses) of 
the distribution of learning times (from 10^ realizations) for various 
sizes of the middle layer. 

Even in the presence of noise, the tolerant version of the
model presented above, and in our previous paper [14],
allows for perfect, but slightly slower learning. Klemm
et al. introduced forgiveness in a slightly different, and
much more elaborate way, by allowing the synapses a
small number of mistakes before punishment. We do not
see the advantage of this ad hoc scheme over our simpler
original version, which also appears to be more feasible
from a biological point of view.

Indeed, much harder problems of the same class as the
XOR can be learned by our network without any
modification. XOR belongs to the "parity" class of problems,
where for a string of arbitrary length N there are $2^N$
realizations composed of all different combinations of
0's and 1's. In order to learn to solve the parity problem
the system must be able to selectively respond to all the
strings with an odd (or even) number of 1's (or 0's). The
XOR function is the simplest case with N = 2.
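
For concreteness, the $2^N$ realizations of the parity task and their targets can be enumerated as in the sketch below. Appending an always-active bias unit follows the XOR set-up above; the function name `parity_task` is hypothetical.

```python
from itertools import product


def parity_task(n_bits):
    """List all 2**n_bits stimuli (with a trailing bias unit set to 1) together
    with the target output: 1 for an odd number of 1's, 0 for an even number."""
    return [(bits + (1,), sum(bits) % 2) for bits in product((0, 1), repeat=n_bits)]


xor_task = parity_task(2)      # the N = 2 case is the XOR problem
largest = parity_task(6)       # the largest string length considered in the text
```

For N = 2 this reproduces the four stimuli 001, 011, 101, 111 with targets 0, 1, 1, 0.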

We used the same network as for the XOR problem, 
but now with increasing N up to string lengths of 6. For 
all cases we chose a relatively large intermediate layer 
with 3000 neurons. Figure 10 shows the results of these
simulations. In panel A the mean error (calculated as in
Klemm et al. for consistency) is the ratio between those
realizations which have learned the complete task and
those that have not, as a function of time. For each N,
a total of 1024 realizations was simulated, each one
initiated from a different random configuration of weights.
Notice that the time axis (for presentation purposes) is
in logarithmic scale. At least for the sizes explored here,
the network solves larger problems following a not very
explosive power-law scaling relationship. Panel B of Fig. 10
shows that learning time scales with problem size with an
exponent $k \approx 1.4$. In conclusion, the nonlinearity does
not appear to introduce additional fundamental problems
into our scheme.
FIG. 10. Learning nonlinear problems beyond XOR. Curves in
panel A show the time dependence of average errors for increasingly
harder parity functions, from order 2 (i.e., the XOR case) to order
6. For each curve, the numbers indicate the size ($l = 2^N$) of the
problem. In panel B the curves shown in A are re-plotted with the
time axis rescaled with the size of the problem, $t' = t/l^k$. Good
data collapse is achieved with $k \approx 1.4$.



B. Generalization and feature detection 

The general focus of most neural network studies has 
been on the ability of the network to generalize, i.e., to 
distinguish between classes of inputs requiring the same 
output. In general, the task of assigning an output to an 
input which has not been seen before is mathematically 
ill-defined, since in principle any arbitrary collection of 
inputs might be assigned to the same output. Practi- 
cally, one would like to map "similar" inputs to the same 
output; again "similar" is ill-defined. We believe that 
similarity is best defined in the context of (biological) 
utility: similar inputs are by definition inputs requiring 
the same reaction (output) in order to be successful (this
is circular, of course). For a frog, things that fly require
it to stick its tongue out in the direction of the flying
object, so all things that fly might be considered
similar; there is not much need for the frog to bother with
things that don't fly. Actually, a frog can only react to
things that move, as demonstrated in the classical paper
by Lettvin, Maturana, McCulloch and Pitts [22] almost
half a century ago. Roughly, the generalization problem 
can be reduced to the problem of identifying useful (or 
dangerous) features in the input that have consequences 
for the action that should be taken. 




FIG. 11. Two inputs, each representing two firing input cells, 
are considered. The two inputs have the input cell in the center 
in common. A) If the outputs should be the same, the common 
neuron is connected with the correct output neuron. B) If the 
outputs should be different, the input neurons that are different
are connected with the two different outputs. 

So how does our network learn to identify useful
features in the input? Suppose (Fig. 11) that we present
two different inputs to, for instance, the random net- 
work, one where input neurons 1 and 2 are firing, and 
another one where inputs 2 and 3 are firing. Consider 
the two cases A) where the output neuron for the two 
inputs should be the same, and B) where the assigned 
outputs are different. 

In the case where the outputs should be different, say
1 and 2, respectively, the algorithm solves the problem
by connecting input 1 to output neuron 1 and input 3 to
output neuron 2 through different intermediate neurons, while
ignoring input 2. The brain identifies features in the
input that are different. The irrelevant feature 2 is not
even "seen" by our brain, since it has no internal
representation in the form of firing intermediate neurons. In
the case where the assigned outputs for the two inputs
are the same, say 1, the problem is solved by connecting
the common input neuron 2 with the output neuron
through a single string of synaptic connections. The
network identifies a feature that is the same for the two
inputs, while ignoring the irrelevant inputs 1 and 3, which
simply do not register in the brain. In a simulation, it
was imposed that when inputs 1 or 3 were active without
2 being active, success was achieved only if the output
was not 1: the frog should not try to eat non-flying
objects. This mechanism can supposedly be generalized to
more complicated situations: depending on the task at
hand, the brain identifies useful features that allow it
to distinguish, or not to distinguish (generalize), between
inputs.

Suppose the system is subsequently presented with a
pattern that in addition to the input neurons above includes
more firing neurons. In case the additional neurons are
irrelevant for the outcome, the system will take
advantage of the connections that have already been created
and ignore the additional inputs. If some of the new
inputs are relevant, in the sense that a different output is
required, further learning involving the new inputs will 
take place in order to allow the system to discriminate 
between outputs. We envision that this process of finer 
and finer discrimination between input classes allows for 
better and better generalization of inputs requiring iden- 
tical outputs. 

The important observation to keep in mind is that the 
concept of generalization is intimately connected with the 
desired function, and cannot be pre-designed. We feel
that, for instance with respect to theories of vision, there
is an undue emphasis on constructing general pattern
detection devices that are not based on the nature of
the specific problem at hand. Whether edges, angles,
contrasts, or whatever are the important features must be
learned, not hardwired. 



C. Learning multi-step sequences. 

In general, the brain has to perform several succes- 
sive tasks in order to achieve a successful result. For 
instance, in a game of chess or backgammon, the reward 
(or punishment) only takes place after the completion of 
several steps. The system can not "benefit" from imme- 
diate punishment following trivial intermediate steps, no 
matter how much the bad decisions contributed to the 
final poor result. 

Consider for simplicity a set-up where the system has 
to learn to present four successive outputs, 1, 2, 3, and 
4, following a single firing input neuron, 1. In general, 
the output decision at any intermediate step will affect 
the input at the next step. Suppose, for instance that in 
order to get from one place to another in a city starting 
at point 1, one first has to choose road 1 to get to point 
2, and then road 2 to go to point 3, and so on. Thus, the 
output represents the point reached by the action, which 
is then seen by the system and represents the following 
input. We represent this by feeding the output signal to 
the input at the next step. Thus, if output number 5 fires 
at an intermediate step, input neuron 5 + 1 = 6 will fire 
at the next step: this is the outer world's reaction to our
action. 

We will facilitate the learning process by presenting 
the system not only with the final problem, but also with 
the simpler intermediate problems: we randomly select 
an input neuron 1 to 4. If the neuron 4 is selected, the 
output neuron 4 must respond. Otherwise the firing neu- 
rons are punished. If the input neuron 3 is selected, the 
output neuron 3 must first fire. This creates an input sig- 
nal at input neuron 4. Then the output neuron 4 must 
fire. For any other combination, all the synapses partic- 
ipating in the two step operation are punished. In case 
the input 2 is presented, output neuron 2 must first fire, 



then output neuron 3, and finally output neuron 4 must 
fire, otherwise all synapses connecting firing neurons in 
the three step process are punished. When the input 1 is 
presented, the four output neurons must fire in the cor- 
rect sequence. Of course, we never evaluate or punish 
intermediate successes! 
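
The evaluation protocol just described can be sketched as follows. Here `act` is a hypothetical stand-in for the network's response (a callable mapping the index of the firing input neuron to the index of the output neuron that fires), and `run_episode` is an illustrative name; the sketch covers only the bookkeeping of the multi-step task, not the learning itself.

```python
def run_episode(start, act, last=4):
    """Present input `start` and let the world feed each output k back as
    input k + 1. The episode is judged only at the end: it succeeds if the
    outputs were start, start + 1, ..., last, in that order."""
    fired, inp = [], start
    for _ in range(last - start + 1):
        out = act(inp)
        fired.append((inp, out))
        inp = out + 1                  # the outer world's reaction to the action
    outputs = [out for _, out in fired]
    return outputs == list(range(start, last + 1)), fired


def perfect(i):
    """Hypothetical fully trained network: input i always yields output i."""
    return i


faulty_table = {1: 1, 2: 5, 6: 3, 4: 4}     # wrong second step, right final step

ok, steps_ok = run_episode(1, perfect)
bad, steps_bad = run_episode(1, faulty_table.__getitem__)
```

When an episode fails, every step in the returned list would be punished, even those that happened to be correct, since no intermediate evaluation takes place.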

For this to work properly, it is essential to employ the 
selective punishment scheme where neurons that have 
once participated in correct sequences are punished less 
than neurons that have never been successful, in order 
for the system to remember partially correct end games 
learned in the past. 

In one typical run, we chose a layered geometry with
10 inputs, 10 outputs, and 20 intermediate neurons. After
4 time steps, the last step $\mathrm{input}_4 \to \mathrm{output}_4$ was
learned for the first time. After 35 time steps, the
sequence $\mathrm{input}_3 \to \mathrm{output}_3 (= \mathrm{input}_4) \to \mathrm{output}_4$ was also
learned; after 57 steps the sequence $\mathrm{input}_2 \to \mathrm{input}_3 \to
\mathrm{input}_4 \to \mathrm{output}_4$ was learned; and finally, after 67 steps
the entire sequence had been learned. These results are
typical. The brain learned the steps backwards, which, 
after all, is the only logical way of doing it. In chess, one 
has to learn that capturing the king is essential before 
the intermediate steps acquire any meaning! 

In order to imitate a changing environment, we may 
reassign one or more of the outputs to fire in the se- 
quence. As in the previous problems, the system will 
keep the parts that were correct, and learn the new seg- 
ments. Older sequences can be swiftly recalled. Finally 
we added uniform random noise of order 10^^ to the out- 
puts; this extended the learning time in the run above to 
193 time steps. 



VI. CONCLUSION AND OUTLOOK 

The employment of these simple principles produces a
self-organized, robust and simple, biologically plausible 
model of learning. It is, however, important to keep 
in mind in which contexts these ideas do apply and in 
which they do not. The model discussed is supposed to 
represent a mechanism for biological learning, that a hy- 
pothetical organism could use in order to solve some of 
the tasks that must be carried out in order to secure its 
survival. On the other hand the model is not supposed 
to solve optimally any problem - real brains are not very 
good at that either. It seems illogical to seek to model 
brain function by constructing contraptions that can per- 
form tasks that real brains, such as ours, are quite poor 
at, such as solving the Travelling Salesman Problem. The
mechanism that we described is not intended to be opti- 
mal, just sufficient for survival. 

Extremal dynamics in the activity followed eventually 
by depression of only the active synapses results in pre- 
serving good synapses for a given job. In contrast to 
other learning schemes, the efficiency also scales as one 
should expect from biology: bigger networks solve a given
problem more efficiently than smaller networks. And all
of this is obtained without having to specify the network's
structure: the same principle works well in randomly
connected, lattice, or layered networks.

In summary, the simple action of choosing the 
strongest and depressing the inadequate synapses leads 
to a permanent counter-balancing which can be analo- 
gous to a critical state in the sense that all states in the 
system are barely stable, or "minimally" stable using the 
jargon of Ref. [17]. This peculiar metastability prevents
the system from stagnating by locking into a single (ad- 
dictive) configuration from which it can be difficult to es- 
cape when novel conditions arise. This feature provides 
for flexible learning and un-learning, without having to 
specifically include an ad-hoc forgetting mechanism - it is 
already embedded as an integrated dynamical property of 
the system. When combined with selective punishment, 
the system can build up a history-dependent toolbox of
responses that can be employed again in the future. 

Un-learning and flexible learning are ubiquitous fea- 
tures of animal learning as discussed recently by Wise 
and Murray [23]. We are not aware of any other simple
learning scheme mastering this crucial ability. 



VII. ACKNOWLEDGEMENTS 

Work supported by the Danish Research Council SNF 
grant 9600585. The Santa Fe Institute receives funding 
from the John D. and Catherine T. MacArthur Founda- 
tion, the NSF (PHY9021437) and the U.S. Department 
of Energy (DE-FG03-94ER61951). 



[1] H. A. Simon, The Sciences of the Artificial (MIT Press,
1996).

[2] A. G. Barto, R. S. Sutton, and C. W. Anderson, IEEE
Transactions on Systems, Man, and Cybernetics 13, 834
(1983).

[3] W. Ross Ashby, Design for a Brain: The Origin of Adaptive
Behaviour (John Wiley and Sons, Inc., New York, 1952).

[4] In a recent paper, "Function and form in networks
of interacting agents," Tanya Araujo and R. Vilela
Mendes analyze the two schemes and explicitly point
out the differences in terms of adaptability and
robustness (http://xyz.lanl.gov/abs/nlin.AO/0009018).

[5] D. O. Hebb, The Organization of Behaviour (Wiley, New
York, 1949).

[6] C. A. Barnes, A. Baranyi, L. J. Bindman, Y. Dudai,
Y. Fregnac, M. Ito, T. Knopfel, S. G. Lisberger, R. G.
M. Morris, M. Moulins, J. A. Movshon, W. Singer, L.
R. Squire, Group report: Relating activity-dependent
modifications of neuronal function to changes in neural
systems and behaviour. In Cellular and Molecular
Mechanisms Underlying Higher Neural Functions, 81-79, A. I.
Selverston and P. Ascher (eds.) (John Wiley and Sons
Ltd, New York, 1994).

[7] The "outside world" could be other parts of the organism,
or even other parts of the brain.

[8] U. Frey and R. G. M. Morris, Nature 385, 533-536 (1997).

[9] G. M. Edelman, Neural Darwinism: The Theory of Neuronal
Group Selection (Basic Books, New York, 1987).

[10] R. M. Fitzsimonds, H-J Song, and M-M Poo, Nature
388, 443-448 (1997).

[11] P. Alstrøm and D. Stassinopoulos, Phys. Rev. E 51,
5028-5033 (1995).

[12] D. Stassinopoulos and P. Bak, Phys. Rev. E 51,
5033 (1995).

[13] P. Bak and K. Sneppen, Phys. Rev. Lett. 71, 4083-4087
(1993).

[14] D. R. Chialvo and P. Bak, Neuroscience 90, 1137 (1999).

[15] K. Klemm, S. Bornholdt, H. G. Schuster, Beyond Hebb:
XOR and biological learning, Phys. Rev. Lett. 84,
1813-1817 (2000).

[16] From URL http://www.santafe.edu/~dchialvo/ or
http://www.ma.ic.ac.uk

[17] P. Bak, C. Tang, and K. Wiesenfeld, Phys. Rev. Lett.
59, 381 (1987); Phys. Rev. A 38, 364 (1988); for a
review see P. Bak, How Nature Works: The Science of
Self-Organized Criticality (Copernicus, New York, 1996;
Oxford University Press, Oxford, 1997).

[18] S. Boettcher and A. G. Percus, Extremal Optimization:
Methods derived from Co-Evolution. In GECCO-99:
Proceedings of the Genetic and Evolutionary Computation
Conference (Morgan Kaufmann, San Francisco, 1999),
825-832. See also math.OC/9904056 at http://xxx.lanl.gov/;
Nature's Way of Optimizing, cond-mat/9901351 at
http://xxx.lanl.gov/

[19] P. Matzinger, Seminars in Immunology 10, 399 (1998);
Nature 369, 602 (1994).

[20] M. L. Minsky and S. A. Papert, Perceptrons: An
Introduction to Computational Geometry (MIT Press,
Cambridge, MA, 1969).

[21] D. H. Wolpert and W. G. Macready, IEEE Transactions
on Evolutionary Computation 1, 67-82 (1997).

[22] J. Y. Lettvin, H. R. Maturana, W. S. McCulloch and
W. H. Pitts, "What the frog's eye tells the frog's brain",
Proceedings of the Institute of Radio Engineers 47,
1940-1951 (1959).

[23] S. P. Wise and E. A. Murray, Arbitrary associations
among antecedents and action, Trends in Neurosciences
23, 271 (2000).






